KMID : 0381120230450121527
Genes and Genomics
2023 Volume.45 No. 12 p.1527 ~ p.1536
A comparative investigation of single nucleotide variant calling for a personal non-Caucasian sequencing sample
Park Hyeon-Seul

Gim Jung-Soo
Abstract
Background : The dropping cost and increasing clinical application of whole genome sequencing (WGS) have created a need for efficient (accurate and rapid) variant calling procedures for personal WGS data (n = 1). A number of variant calling pipelines have been introduced that use the human genome reference GRCh38 as the reference and a benchmark dataset called ‘NA12878’; both are ‘standard’ but of limited ethnic origin. Considering the nature of variant calling algorithms and recent updates in sequencing protocols, however, it is necessary to revisit the efficiency of the current best pipelines for personal WGS data from diverse ethnicities.

Objective : We discuss the most efficient practices for variant calling of personal WGS reads, with particular emphasis on whether (1) an ethnic match or mismatch between the reference genome and the WGS data produces distinct results and, more importantly, (2) there is an ethnicity-specific optimal workflow.

Methods : Here, we generated WGS data, DNA array data, and a sufficient number of Sanger-validated variants from a single Korean subject to perform such a comprehensive comparison. We applied these WGS reads and the ‘NA12878’ reads to 8 different variant calling pipelines, aligning the reads from both ethnic origins to 2 different reference genomes (GRCh38 and KOREF, a Korean reference genome).

Results : We evaluated the performance of the pipelines against the matched array genotype data and Sanger sequencing validation and demonstrated that, regardless of ethnic match or mismatch, (1) Novoalign-GATK4 showed the most efficient performance, with exceptional calls in the MHC region; and (2) overall performance was better with GRCh38, although a significant difference in recall was observed. In addition, we found that removing the ‘markduplication’ step for PCR-free WGS data largely reduced computing cost while maintaining performance.
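As a rough illustration of how a pipeline's calls can be scored against array genotypes, the sketch below computes precision, recall, and F1 for SNV calls. It is a minimal sketch under stated assumptions, not the authors' evaluation code: variants are matched exactly on (chromosome, position, ref, alt), and calls are only scored at sites the array actually assayed. The `Variant` type, the `concordance` function, and the toy coordinates are hypothetical.

```python
from typing import Dict, NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """Hypothetical minimal SNV record: exact-match key for comparison."""
    chrom: str
    pos: int
    ref: str
    alt: str


def concordance(calls: Set[Variant],
                truth: Set[Variant],
                assayed: Set[Tuple[str, int]]) -> Dict[str, float]:
    """Score calls against a truth set restricted to assayed sites (assumption)."""
    # Only calls at sites the truth platform assayed can be judged right or wrong.
    in_scope = {v for v in calls if (v.chrom, v.pos) in assayed}
    tp = len(in_scope & truth)   # called and confirmed
    fp = len(in_scope - truth)   # called at an assayed site but not confirmed
    fn = len(truth - in_scope)   # confirmed but missed (or wrong allele)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    truth = {Variant("chr1", 100, "A", "G"), Variant("chr1", 200, "C", "T"),
             Variant("chr2", 50, "G", "A")}
    assayed = {("chr1", 100), ("chr1", 200), ("chr2", 50), ("chr2", 75)}
    calls = {Variant("chr1", 100, "A", "G"), Variant("chr2", 75, "T", "C"),
             Variant("chr3", 10, "A", "T")}  # chr3:10 is not assayed, so it is ignored
    print(concordance(calls, truth, assayed))
```

In this toy run one call is confirmed, one is an unconfirmed call at an assayed site, and two truth variants are missed, so recall (1/3) is lower than precision (1/2). Under exact matching, a call with the wrong alternate allele at a true site counts as both a false positive and a false negative.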

Conclusion : For variant calling of personal PCR-free WGS data, regardless of ethnicity, we recommend using Novoalign + GATK4 with GRCh38 and without ‘markduplication’.
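To show what the skipped ‘markduplication’ step does, the sketch below groups single-end reads by (chromosome, unclipped 5′ position, strand) and flags all but the highest-quality read in each group as duplicates. This is a deliberately simplified, hypothetical model, not the implementation of Picard or GATK `MarkDuplicates`: real tools also handle paired-end fragments, soft clipping, and optical duplicates. The `Read` type, the `mark_duplicates` function, and the quality-sum scoring rule are assumptions made for illustration. Because PCR-free libraries skip amplification, fewer reads are expected to share a start position and strand, which is one intuition for why dropping this step can save compute with little effect on calls.

```python
from typing import Dict, List, NamedTuple, Set, Tuple


class Read(NamedTuple):
    """Hypothetical single-end alignment summary."""
    name: str
    chrom: str
    five_prime_pos: int      # unclipped 5' alignment position (assumed precomputed)
    reverse: bool            # True if aligned to the reverse strand
    base_qualities: List[int]


def mark_duplicates(reads: List[Read]) -> Set[str]:
    """Return names of reads flagged as duplicates (simplified model)."""
    best: Dict[Tuple[str, int, bool], Read] = {}
    for r in reads:
        key = (r.chrom, r.five_prime_pos, r.reverse)
        current = best.get(key)
        # Keep the read with the highest summed base quality; first seen wins ties.
        if current is None or sum(r.base_qualities) > sum(current.base_qualities):
            best[key] = r
    kept = {r.name for r in best.values()}
    return {r.name for r in reads if r.name not in kept}


if __name__ == "__main__":
    reads = [
        Read("a", "chr1", 100, False, [30, 30]),
        Read("b", "chr1", 100, False, [20, 20]),  # same start/strand as "a", lower quality
        Read("c", "chr1", 100, True, [30]),       # same start, opposite strand: not a duplicate
        Read("d", "chr1", 200, False, [10]),
    ]
    dups = mark_duplicates(reads)
    print(sorted(dups), len(dups) / len(reads))
```

Here only read `b` is flagged, giving a duplicate rate of 0.25 in this toy input.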
KEYWORD
Personal variant calling, Variant calling pipeline, Non-Caucasian, Alternative genome reference, Markduplication, MHC variant calling